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Abstract.  In  signal  acquisition,  Toeplitz  and  circulant  matrices  are  widely  used  as  sensing  operators.  They  correspond 
to  discrete  convolutions  and  are  easily  or  even  naturally  realized  in  various  applications.  For  compressive  sensing,  recent  work 
has  used  random  Toeplitz  and  circulant  sensing  matrices  and  proved  their  efficiency  in  theory,  by  computer  simulations,  as 
well  as  through  physical  optical  experiments.  Motivated  by  recent  work  [7],  we  propose  models  to  learn  a  circulant  sensing 
matrix/operator  for  one  and  higher  dimensional  signals.  Given  the  dictionary  of  the  signal(s)  to  be  sensed,  the  learned  circulant 
sensing  matrix/operator  is  more  effective  than  a  randomly  generated  circulant  sensing  matrix/operator,  and  even  slightly  so 
than  a  Gaussian  random  sensing  matrix.  In  addition,  by  exploiting  the  circulant  structure,  we  improve  the  learning  from  the 
patch  scale  in  [7]  to  the  much  large  image  scale.  Furthermore,  we  test  learning  the  circulant  sensing  matrix/operator  and  the 
nonparametric  dictionary  altogether  and  obtain  even  better  performance.  We  demonstrate  these  results  using  both  synthetic 
sparse  signals  and  real  images. 
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dictionary  learning 


1.  Introduction.  Compressive  sensing  (CS)  ([6,  5])  acquires  a  compressible  signal  from  a  small  number 
of  linear  projections.  Let  x  denote  an  n-dimensional  real  or  complex  vector  that  is  sparse  under  a  certain 
basis  ’I',  i.e. ,  one  can  write  x  =  ’L(9  with  a  sparse  6.  Let  b  :=  <J>x  represent  a  set  of  m  linear  projections  of  x. 
The  basis  pursuit  problem 


BP:  min||0||i  s.t.  <E”3/0  =  b,  (1.1) 

as  well  as  several  other  methods,  has  been  known  to  return  a  sparse  vector  9  and  thus  recover  x  =  $0  under 
certain  conditions  on  the  sensing  matrix  $  and  basis  T. 

In  many  CS  applications,  the  acquisition  of  the  linear  projections  <h.T  requires  a  physical  implementation. 
In  most  cases,  the  use  of  an  i.i.d.  Gaussian  random  matrix  <I>  is  either  impossible  or  overly  expensive.  This 
motivates  the  study  of  easily  implementable  CS  matrices.  Two  types  of  such  matrices  are  the  Toeplitz  and 
circulant  matrices,  which  have  been  shown  to  be  almost  as  effective  as  the  Gaussian  random  matrix  for  CS 
encoding/decoding.  Toeplitz  and  circulant  matrices  have  the  forms 
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respectively.  When  matrix  T  satisfies  the  additional  property  that  tt  =  in+i,V*,  it  becomes  a  circulant 
matrix  C.  Since  a  (partial1)  Toeplitz  matrix  has  very  similar  theoretical  and  computational  properties  to 
a  (partial)  circulant  matrix  of  the  same  size,  our  discussions  below  are  based  exclusively  on  the  circulant 
matrix.  Using  Toeplitz,  rather  than  circulant,  matrices  will  incur  some  insignificant  computation  overhead 
to  the  methods  proposed  in  this  paper. 

1.1.  Circulant  Compressive  Sensing.  In  various  physical  domains,  it  is  easy  to  realize  Cx  since  it  is 
equivalent  to  the  discrete  convolution  c*x  for  a  certain  vector  c;  hence,  if  P  is  a  row-selection  (downsampling) 
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1By  “a  partial  matrix”,  we  refer  to  a  submatrix  formed  by  selecting  m  out  of  the  n  rows  of  the  original  matrix,  where 
m  <  n.  Given  a  row  selection  operator  P  and  an  original  matrix  M,  PM  is  a  partial  matrix. 
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operator,  PCx  becomes  circulant  CS  measurements  of  x.  Using  either  Toeplitz  or  circulant  matrices,  Tropp 
et  dZ.  [24]  describes  a  random  filter  for  acquiring  a  signal  x;  Haupt  et  oZ.[10]  describes  a  channel  estimation 
problem  to  identify  a  vector  x  (called  impulse  responses)  that  characterizes  a  discrete  linear  time-invariant 
(LTI)  system;  Meng  et  aZ.[18,  19]  applies  it  to  high-resolution  OFDM  channel  estimation.  Random  con¬ 
volutions  can  also  be  applied  in  some  imaging  systems  in  which  convolutions  either  naturally  arise  or  can 
be  physically  realized  [3,  12,  16,  15,  23].  Furthermore,  random  convolutions  can  be  realized  by  an  optical 
correlator  [23].  Since  any  matrix  C  in  (1.2)  can  be  diagonalized  by  a  Fourier  transform,  i.e.,  obeying 

C  =  FDF *,  (1.3) 

where  F  is  the  discrete  Fourier  matrix  of  the  same  size  as  C  and  D  is  a  diagonal  matrix,  implementing  Cx 
is  equivalent  to  implementing  FDF*  x,  which  can  be  realized  through  optical  lens  and  a  static  spatial  light 
modulator  (SLM).  Recently,  the  use  of  Toeplitz  and  circulant  matrices  has  been  proposed  for  compressive 
MR  imaging  by  Liang  et  al.  [13].  A  good  review  can  be  found  in  [26]. 

There  are  rich  theoretical  results  on  circulant  matrices  for  CS.  In  [2],  Toeplitz  measurement  matrices  are 
constructed  with  i.i.d.  random  entries  of  ±1  or  {—1,  0, 1};  their  downsampling  effectively  selects  the  first  m 
rows;  and  the  number  of  measurements  needed  for  stable  l\  recovery  is  shown  to  be  m  >  0(fc3  dogn/fc),  where 
k  is  the  signal  sparsity.  [11]  selects  the  first  m  rows  of  a  Toeplitz  matrix  with  i.i.d.  Bernoulli  or  Gaussian 
entries  for  sparse  channel  estimation.  Their  scheme  requires  m  >  0(k2  ■  logn)  for  stable  recovery.  The 
work  [19]  establishes  stable  recovery  under  the  condition  m  >  0(k2  log(n/fc)).  In  [23],  random  convolution 
with  either  random  downsampling  or  random  demodulation  is  proposed  and  studied.  It  is  shown  that  the 
resulting  measurement  matrix  is  incoherent  with  any  given  sparse  basis  with  high  probability  and  i\  recovery 
is  stable  given  m  >  0(k  •  logn  +  log3  n).  Recent  results  in  [22]  show  that  several  random  circulant  matrices 
satisfy  the  restricted  isometry  property  (RIP)  in  expectation  given  m  >  0(max{s3/2  log3/2  n,  k  log2  k  log2  n}) 
with  arbitrary  downsampling.  We  note  that  all  these  results  are  based  on  random  circulant  matrices,  so  they 
do  not  apply  to  optimized  circulant  matrices  in  this  paper.  We  demonstrate  that  learned  circulant  matrices 
achieve  even  better  performance. 

The  use  of  circulant  sensing  matrices  also  allows  faster  signal  and  image  recovery.  Practical  algorithms 
(e.g.,  [25,  27,  28,  31,  29])  for  CS  are  based  on  performing  operations  including  multiplications  involving  with 
>I>  and  $*.  For  partial  circulant  matrix  d>  =  PC,  $x  and  <E>*y  each  can  be  quickly  computed  by  two  fast 
Fourier  transforms  (FFTs)  and  simple  component-wise  operations.  This  is  much  cheaper  than  multiplying 
a  general  matrix  with  a  vector.  For  image  recovery,  a  splitting  algorithm  taking  advantages  of  the  circulant 
structure  has  been  proposed  in  [30]  and  shows  both  satisfactory  recovery  quality  and  speed.  These  fast 
algorithms  apply  to  the  learned  circulant  matrices  in  this  paper. 

1.2.  Learning  Dictionaries  and  Sensing  Matrices.  In  CS,  signal  and  image  reconstructions  are 
based  on  how  they  are  sparsely  represented.  The  sparse  representation  involves  a  choice  of  dictionary,  a 
set  of  elementary  signals  (or  atoms)  used  to  sparsely  decompose  the  underlying  signal  or  image.  There 
are  analytic  dictionaries  and  learned  dictionaries.  Examples  of  analytic  dictionaries  include  the  discrete 
cosine  basis,  various  wavelets  bases,  as  well  as  tight  frames.  Some  of  them  are  orthogonal  while  others 
are  over-complete.  Their  analytic  properties  have  been  studied,  and  they  feature  fast  implementations; 
hence,  they  have  found  wide  applications.  Properly  learned  (as  opposed  to  analytic)  bases  can  give  rise 
to  even  sparser  representations  of  signals  and,  in  particular,  images,  so  they  can  give  better  encoding  and 
decoding  performance  than  the  analytic  dictionaries;  see  [21,  9,  20]  for  examples  and  explanations.  Although 
most  theoretical  results  of  CS  recovery  do  not  apply  to  learned  dictionaries  and  optimized  sensing  matrices, 
one  useful  tool  is  the  so-called  mutual  coherence  between  a  dictionary  T  and  a  sensing  matrix  with 
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(1.4) 


D  :=  $'1'  =  [g?i,  ■  ■  ■ ,  dx\,  it  is  defined  as  [14] 


\d*dj\ 

W^hWdjh' 


A  smaller  n(D)  tends  to  allow  more  4/-sparse  signals  x  to  be  successfully  recovered  from  measurements 
4>ai  via  various  CS  algorithms.  Hence,  Elad  [8]  seeks  means  to  reduce  /x( D ).  He  considers  the  Gram 
matrix  G  =  D* D  and  proposes  to  reduce  //*(£>)  :=  (E^yi^xif  (fti  >  t)  ■  \gij |)/(Ei^,i<i j<ic(lfcl  >  *))• 
His  work  demonstrates  improved  recovery  quality  with  learned  (non-circulant)  sensing  matrices  <5,  and  it 
has  motivated  the  subsequent  work  [7].  Work  [7]  is  not  based  on  minimizing  /i(D)  but  instead  pursuing 
($4')* ($4')  S3  /,  where  I  denotes  the  identity  matrix.  The  results  of  [7]  are  even  better  since  p(D)  is  a 
worst-case  characteristic  whereas  (4>4')*(<f>4')  ss  /  is  more  toward  improving  the  overall  performance.  Also, 
the  latter  is  easier  to  compute. 


1.3.  Contributions.  We  are  motivated  by  [7]  and  propose  numerical  methods  to  minimize  ||  (4>4')*(4>4/)  — 
I\\f,  where  <f>  is  either  a  full  or  partial  circulant  matrix.  Like  [7],  we  also  learn  $  and  4>  together,  performing 
the  so-called  the  coupled  learning  of  4>  and  4'.  While  results  of  [7]  are  limited  to  one  dimensional  case,  we 
take  advantages  of  the  circulant  structure  and  deal  with  more  than  one  dimension,  enabling  the  learned 
circulant  sensing  for  signals  such  as  images  and  videos.  Current  atoms  are  patches,  e.g.,  of  size  8x8.  We 
further  address  the  patch-scale  limitation  of  [7],  namely,  the  sensing  matrix  size  n  must  equal  the  dictionary 
atom  size;  namely,  if  the  dictionary  for  a  512  x  512  image  is  formed  by  8  x  8  patches,  the  sensing  matrix 
generated  in  [7]  has  only  64  =  8  x  8  columns  and  applies  to  vectors  of  length  64,  instead  of  512  x  512.  Hence, 
the  learned  sensing  matrix  cannot  be  applied  to  the  entire  image.  We  remove  this  limitation  and  perform 
image-scale  learning  by  generating  circulant  sensing  operators  applicable  to  the  entire  signals  or  images. 

Our  approaches  are  tested  on  synthetic  ID  signals,  as  well  as  the  images  in  the  Berkeley  segmentation 
dataset  [17].  The  learned  circulant  sensing  matrix  gives  better  recoverability  over  both  random  circulant 
and  Gaussian  random  matrices.  For  real  image  tests,  the  coupled  learning  approach  achieves  even  better 
recovery  performance. 

1.4.  Notation.  We  let  (-)T  and  (•)*  denote  transpose  and  conjugate  transpose ,  respectively,  conj(-) 

stands  for  the  conjugate  operator.  Define  A  •  B  =  E,  j  and  (A  B)  =  E;  j  conj (Aij)Bij  for  any  two 

matrices  A,  B  of  the  same  dimension.  vec(A)  is  a  vector  formed  by  stacking  all  the  columns  of  A,  and  diag(-) 
is  defined  in  the  same  way  as  MATLAB  function  diag,  which  either  extracts  the  diagonal  entries  of  a  given 
matrix  to  form  a  vector  or,  given  a  vector,  forms  a  diagonal  matrix  with  the  vector’s  entries.  In  addition,  © 
and  0  denote  component-wise  multiplication  and  division,  respectively. 

The  rest  of  this  paper  is  organized  as  follows.  Section  2  overviews  one  and  two  dimensional  circulant 
correlations.  A  two-step  procedure  to  optimize  a  partial  circulant  sensing  matrix/operator  is  described  in 
Section  3.  Section  4  describes  how  the  involved  optimization  problems  are  solved.  Numerical  results  are 
presented  in  Section  5. 


2.  Overview  of  Circulant  Correlations.  Given  a  kernel  v  £  Cn,  the  circulant  correlation  x  *  v  for 
a  vector  x  £  C"  is  defined  as 

n 

(X'kv)k  =  y ^XjVkj,  for  k  =  1, . . .  ,n,  (2.1) 

i=  1 

where  ki  =  mod(n  +  *  —  k,  n)  +  1.  Let  c  =  [vn,  vn-i, . . . ,  ,ci]T.  Then  x*v  is  equivalent  to  the  convolution 
x  *  c,  which  is  defined  as 

n 

(■ X  *c)k  =  xicki  >  for  k  =  1, . . . ,  n, 

i=i 
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where  hi  =  mod(n  +  k  —  i  —  1,  n)  +  1. 

We  next  introduce  three  methods  to  compute  x  *  v.  First,  introducing  the  n  x  n  cyclic  permutation 
matrix 


we  can  write  formula  (2.1)  as 


Pn 


0  0  •••  0  1 
1  0  •••00 

0  '••  :  : 

:  0  0 
0  •■■  010 


( x  *  v)k  =  xT for  k  =  1, . . . ,  n. 


(2.2) 


Multiplying  Pn  from  the  left  circularly  down-shifts  by  a  row  and  multiplying  Pj  from  the  right  circularly 
right-shifts  a  column.  For  example,  given  v  =  [1,  2,  3]T  and  x  =  [—1,  0, 1]T,  y  =  x-kv  equals 

yi  =  xTP$v  =  —  l-l  +  0-  2  +  l-  3  =  2, 

Vi  =  xtP3v  =  —  l-3  +  0-  l  +  l-  2  =  — 1, 
y3  =  xT  P3v  =  —  l-2  +  0-  3  +  l-  l  =  — 1. 


Secondly,  we  can  compute  x  *  v  via  the  matrix-vector  multiplication  Cx  with  the  circulant  matrix 


Vi  v2  •  •  •  vn 


C  = 


Vn  Vi 


Vn- 1 


(2.3) 


V2  V3  •  •  •  Vi 


Thirdly,  x  *  v  can  be  quickly  computed  by  two  fast  Fourier  transforms  and  some  component-wise  multipli¬ 
cations  as  described  in  the  following  lemma,  which  is  a  restatement  of  the  convolution  theorem. 

Lemma  2.1.  Any  circulant  matrix  C  in  (2.3)  can  be  written  as  C  =  FDF* ,  where  F  is  the  nxn  unitary 
discrete  Fourier  transformation  matrix  and  D  =  diag(d)  where  d  =  y/nFv. 

Remark  2.1.  The  matrix  C  is  real- valued  if  d  is  conjugate  symmetric,  namely,  di  =  conj(di')  for  every 
i  and  i'  =  mod(n  —  *  + 1 ,  ra)  + 1 .  Imposing  conjugate  symmetry  leads  to  real-valued  C  and  reduces  the  freedom 
of  d  to  nearly  half. 

2D  circulant  correlation  inherits  the  nice  properties  of  ID  circulant  correlation.  Given  a  kernel  M  £ 
C n1x«2;  the  2D  circulant  correlation  Y  —  X  -k  M  for  a  given  matrix  X  G  C™lX™2  is  defined  by 


(X  *  M)ts  =  ^2  XiiMUsj  i  for  1  <  t  <  ni,  1  <  s  <  n2,  (2.4) 

i,j 

where  fj  =  mod(ni  +  *  —  t,  nf)  +  1  and  Sj  =  mod(ri2  +  j  —  s,  n2)  +  1. 

Similar  to  ID  circulant,  X  *  M  can  be  computed  in  three  ways.  First,  with  cyclic  permutation  matrices 
Pni  and  Pn2 ,  formula  (2.4)  can  be  written  as 


(. XkM)ts  =  X  •  (P^MiP^y-1),  for  1  <  t  <  nr,  1  <  a  <  n2. 


For  instance,  if 
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then  Y  =  X  *  M  has  components 
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Secondly,  Y  =  X  ★  M  can  be  computed  by  matrix-vector  multiplications  via  vec(Y)  =  C2vec(X),  where 


C2  =  [vec(M1,:L),  vec(M2,1), . . . ,  vec (AT*1-1), . . . ,  vec (M1-8), . . . ,  vec (Mni>a), . . . ,  vec(Mni-na)] 


(2.5) 


with  the  notation  :=  Pf^  1M(Pjo)s  1  for  1  <  t  <  n\  and  1  <  s  <  n2.  It  is  not  difficult  to  verify  that 
C2  is  a  block-circulant  matrix.  For  the  above  example,  we  have 


C2 


1  4  2  5  3  6 

4  1  5  2  6  3 

3  6  1  4  2  5 

6  3  4  1  5  2 

2  5  3  6  1  4 

5  2  6  3  4  1 


and  C2vec(X) 


14  2  5 

4  15  2 

3  6  14 

6  3  4  1 
2  5  3  6 

5  2  6  3 


'  1  ‘ 

-3' 

—2 

0 

-1 

-5 

0 

—2 

0 

-1 

_  1  _ 

-4 

=  vec(Y). 


Thirdly,  we  can  quickly  compute  X  *  M  through  two  fast  2D  Fourier  transforms  and  some  component-wise 
multiplications  according  to  the  following  lemma,  which  is  a  restatement  of  the  2D  convolution  theorem. 

Lemma  2.2.  Given  kernel  M  £  CniXn2,  define  2D  circulant  operator  C2  on  CniX"2  by  C2(X)  =  X  *  M 
for  X  £  Cnix"2_  ThenC2  can  be  represented  as  C2  =  where  T2  is  the  orthogonal  2D  discrete  Fourier 

transformation  operator  on  CniXn2,  Tlf  is  the  adjoint  operator  of  J-2  as  well  as  its  inverse  operator ,  and  V 
is  an  operator  on  CniX"2  defined  by  V(X)  =  V  ©  X  for  any  X  £  CniXn2  with  V  =  yJn\n2T2{M). 

Remark  2.2.  The  block-circulant  matrix  C2  corresponding  to  C2  is  real  valued  if  V  is  conjugate  sym¬ 
metric,  namely,  Vij  =  Vi'p  for  every  pair  of  i,j  and  i'  =  mod(ni  —  i  +  l,nfi)  +  l,  j'  =  mod(n2  —  j  +  l,n2)  +  l. 


3.  Learning  Circulant  Sensing  Matrix/ Operator.  In  this  section,  we  illustrate  how  to  learn  the  ID 
kernel  v  in  formula  (2.1)  and  the  2D  kernel  M  in  formula  (2.4),  as  well  as  their  corresponding  downsampling 
operators  P  (row  selection)  and  Vq  (sample  selection);  hence,  we  form  a  partial  circulant  sensing  matrix 
PC  £  Cmxn  and  a  partial  2D  circulant  sensing  operator  PqC2,  respectively.  Here,  P  selects  m  out  of  the  n 
rows  from  C,  and  Vq  collects  the  entries  in  C  {1, . . . ,  ni}  x  {1, . . . ,  n2}  and  forms  a  column  vector.  For 
example, 


X  = 


1  2 
4  5 
7  8 


3 

6 

9 


and  D  =  {(1,2),  (2,1),  (3,1),  (2, 3)}, 


lead  to  Vn(X)  =  [4,7,2, 6]T. 

3.1.  Learning  ID  circulant  kernel  and  downsampler.  Let  'f  £  CnxK  be  a  given  dictionary  such 
that  the  underlying  signal  x  =  ^9,  where  6  is  a  (nearly)  sparse  vector.  Following  [7],  we  would  like  to  design 
a  partial  circulant  sensing  matrix  $  =  PC  such  that  «  /,  where  C  is  an  n  x  n  circulant  matrix 

and  P  is  an  m  x  n  downsampling  matrix.  To  construct  a  partial  circulant  matrix  <f>,  we  shall  determine  P 
and  C.  We  first  learn  the  circulant  matrix  C  and  then  the  downsampler  P. 
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According  to  Lemma  2.1,  C  =  FDF*  where  the  diagonal  matrix  D  =  diag(ci)  with  entries  d  =  ^JnFv 
and  v  is  the  kernel  of  C .  Therefore,  learning  C  is  equivalent  to  choosing  the  best  kernel  v  or  vector  d.  Our 
approach  is  based  on  solving 

mm\\^*C*C^  -  I\\F,  (3.1) 

so  that  'L *C*C «  I.  From  F*F  =  J,  we  have 

=  '&*FD*F*FDF*'&  =  V*  Fdiag(cf  )diag(d)F*4' . 

Introduce  B  :=  F*'!r'i!*F,  which  satisfies  B*  =  B.  Let  B  :=  [|i?jj|2].  Given  T,  matrices  B  and  B  are 
constant  matrices.  Let 

*:=(|d1|Md2|2,...,|dn|2)T>0,  (3.2) 

which  are  unknowns  to  be  determined.  We  have  diag(d*)diag(d)  =  diag(x)  and 
l-\\^*C*C*  -  lfF  =  ^ || \E,*Fdiag(d* )diag(d)F*'F  -  I\\2F 

1  71 

=  -tr('P*Fdiag(x)-Bdiag(x)F*'!l/)  —  tr('P*Fdiag(x)F*'P)  +  — 

1  71 

=  -tr(diag(x)Fdiag(x)H)  —  tr(diag(x)H)  +  — 

1  71 

=  -xT  diag(Hdiag(x)F)  —  xTdiag(U)  +  — 

1  —  71 

=  -xT  Bx  —  xTdiag(R)  +  — . 

Hence,  we  reduce  (3.1)  to 

min  -xT  Bx  —  xTdiag(F).  (3.3) 

a?>0  2 

Problem  (3.3)  is  a  convex  quadratic  program  and  can  be  reformulated  as  a  non-negative  least-squares  prob¬ 
lem. 

Given  a  solution  x  =  xopt  to  (3.3),  any  d  obeying  (3.2)  gives  C  =  Fdiag(d)F*  as  a  solution  to  (3.1). 
Since  only  |dj|,  i  =  1 ,.. .  ,n,  are  specified,  the  phases  of  d,  are  still  subject  to  determine.  Generally,  phases 
encode  the  location  of  the  information  in  a  signal.  Since  such  location  is  unknown  at  the  time  of  sensing, 
we  choose  to  generate  the  phase  of  every  di  uniformly  at  random.  Models  (3.1)  and  (3.3)  lead  to  significant 
improvement  in  sensing  efficiency  as  we  shall  demonstrate  by  numerical  examples  in  Section  5. 

Remark  3.1.  The  above  procedure  generates  a  complex-valued  C.  To  obtain  a  real-valued  C,  one  shall 
add  constraints  Xi  =  xn-i+ 2  for  i  =  2, . . .  ,n  in  the  problem  (3.3)  and  then  generate  a  conjugate  symmetric 
d  from  the  solution  xopt . 

We  also  tested  an  alternative  model: 

min  i||C^-J|||.  (3.4) 

which  is  equivalent  to 

min  \d* Ad  —  ^dT  diag(F*TF)  —  ^conj(dT  diag(F*TF)),  (3.5) 

dgC"  22  2 

where  A  is  a  diagonal  matrix  with  diagonal  entries  given  by  vector  diag(F*'F’P*F).  Since  (3.5)  is  component¬ 
wise  separable  in  di ,  it  is  easy  to  derive  its  closed-form  solution 

dopt  =  conj(diag(F*'PF))  0  diag(F*'P’3/*F). 
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(3.6) 


However,  our  numerical  experience  suggests  that  this  solution  (and  thus  model  (3.4))  is  not  as  effective  as 
that  of  (3.1). 

After  the  circulant  matrix  C  is  determined,  we  now  optimize  P,  and  thus  fully  determine  *!>.  A  simple 
yet  effective  solution  is  the  random  selection,  that  is,  let  P  select  m  out  of  the  n  rows  of  C  uniformly  at 
random.  This  tends  to  work  well  on  signals  without  dominating  frequencies. 

We  can  also  choose  to  minimize  —  J||j7  so  that  4/ *$*$4/  a;  /.  Let  pi  be  the  binary  variable 

indicating  the  selection  of  element  i,  i.e. , 

!1,  matrix  P  selects  row  i , 

(3.7) 

0,  otherwise. 


Since  4>  =  PC ,  we  shall  solve 


-  I\\2f  = 


^PiiQiQi)  ~  I 


where  qi  is  the  i-th  column  of  4PC*.  Model  (3.8)  is  equivalent  to  the  binary  quadratic  program 
min  ||  [vec(gig*),  •  •  •  ,  vec(g„g*)]p  -  vec(I)\\l ,  subject  to  =  m  and  pi  G  {0,1},  Vi. 

T)  ^ J 


(3.8) 


(3.9) 


Here,  [vec(giq{),  •  •  •  ,  vec(g„g*)]  is  a  matrix  of  n2  rows  and  n  columns.  By  simple  calculation,  we  can  write 
(3.9)  into  the  following  equivalent  problem 


min pT Hp  —  2 fTp,  subject  to 

p 


y ^Pi  =  m  and  pi  G  {0, 1}, 

i 


Vi, 


(3.10) 


where  H  =  [|Gy|2],  /  =  diag(G)  and  G  =  Problem  (3.10)  is  NP-hard  in  general  and  can  be  solved 

to  a  moderate  size  by  solvers  such  as  Gurobi  [4]. 


3.2.  Learning  2D  circulant  kernel  and  downsampler.  2D  circulant  operator  C2  is  often  used  to 
sense  images  and  videos.  Given  a  dictionary  'L  =  [^l,  ^2,  •  ■  •  ,  iPk]  where  each  tpi  G  C"lXri2,  we  define  a 
linear  operator  Q  on  CK  by  Q(9 )  =  f°r  &  G  .  An  array  X  G  <CniX"2  is  sparse  with  respect  to 

the  dictionary  4/  if  it  can  be  represented  as  X  =  Q(9)  with  a  sparse  9  G  CK .  Let  Vq  be  the  downsampling 
operator  on  CniX"2  defined  at  the  beginning  of  this  section.  Then  the  basis  pursuit  problem  to  recover  X 
can  be  written  as 


min  ||0||i,  subject  to  PqC2Q(0)  =  b,  (3.11) 

0 

where  b  =  PqC2(A')  contains  the  measurements  of  A.  To  improve  the  recoverability  of  the  t\  optimization 
problem  (3.11),  we  shall  try  to  make  (C2Q)*C2Q  close  to  the  identity  operator  X  on  CK .  The  adjoint  operator 
of  Q  is  defined  as 


Q*(X)  =  KVh.X),  (tp2,  A), . . . ,  A)]T,  for  any  A  G  C"lX”2 


Hence,  for  any  6  G  CK , 


(C2Q)*C2Q(0)  = 


K 


K 


K 


T 


<e2(Vh),£  9zc2(^)),  (c2^2),J2  •  •  • ,  (c2(V>k),  E  WM) 


and  making  (C2Q)*C2Q  ~  X  is  equivalent  to  making  (C2{ipj),  Xlili  0*^2  W1*))  ~  0j  for  1  <  j  <  K.  Toward 
this  goal,  we  solve 


min\\G  -  I\\2f,  subject  to  Gts  =  (C2{ipt),  C2{ips)), 

(jr 


(3.12) 
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which  can  simplified  as  follows.  According  to  Lemma  2.2,  we  have 

(Ca(^),C2(^))  =  <7-2V7J(^),7-2VJJ(^))  =  (V7J(V«t),V7J(^))  =  (V*VJJ(V»t),7J(^)). 

Let  G  =  conj(H)  ©  H  and  define  U  :=  V*V,  where  V*  is  the  adjoint  operator  of  V.  We  have  U(X)  =  U  O  X 
for  any  X  £  C"lXn2.  Let  u  =  vec  (U)  and  Y  =  [y1  ,y2 , . . .  ,yK]  with  ys  =  vec(J?2  (ifs))  for  s  =  1,2,..  .,K. 
Then 

G*s  =  =  («©V)V  =«*(conj(yt)©i/-). 

Hence, 

||G|||  =  J>onj  (Gts)  •  Gts 

t,S 

=  Y  con.i  (M*(conj(z/*)  ©  ys ))  •  (u*(conj(3/*)  ©  ys)) 

t,S 

=  YU*  ((conj (j/*)  ©  ys)(conj(y‘)  Qys))*u 

t,S 

=  YU*  (conJ (yt(ytT)(ys{ysT)) u  =  u*Au , 

t,s 

where  A  :=  J^t  s  conj(2/*(2/t)*)(ys(2/s)*)  =  conj(YF*)  ©  (YY*).  In  addition, 

tr(G)  =  tr(G*)  =  Y  u*(conjG/)  ©  V*)  =  (diag(YY*))*w  =  f*u , 

t 

with  /  :=  diag(YY*).  Therefore,  problem  (3.12)  is  equivalent  to  the  convex  quadratic  program 

min  }-uT Au  —  fTu  (3.13) 

w> o  2 

in  the  same  form  of  (3.3),  where  we  have  used  real  transpose  instead  of  conjugate  transpose  since  f,u  and 
A  are  all  real. 

Given  a  solution  u  =  uopt  of  (3.13),  we  can  obtain  matrix  U  via  u  =  vec (U).  Any  V  satisfying 
U  =  conj(H)  ©  V  defines  C2  =  T2VT2  as  a  solution  to  (3.12).  Like  done  in  Section  3.1  for  ID  circulant,  we 
choose  to  generate  the  phase  of  each  entry  Vjs  uniformly  at  random. 

Remark  3.2.  The  above  procedure  gives  a  complex-valued  C2.  To  obtain  a  real-valued  C2,  we  can 
add  constraints  W(ni_j.)j+j  =  Wfni-ilj'+i'  for  every  pair  of  i,j  and  i'  =  mod(ni  —  i  +  l,ni)  +  1  ,j'  = 
mod(n.2  —  j  +  1,7x2)  +  1  to  (3.13),  and  then  generate  a  conjugate  symmetric  V  from  the  solution  uop t. 

Given  C2,  we  can  choose  Q  uniformly  at  random  or  optimize  Vn  toward  (PQ.C2Q)* (’PnG^Q)  ~  ©,  which 
can  be  achieved  by  solving 

mm\\G-I\\2F,  subject  to  Gts  =  'PaC2(V’s))-  (3.14) 

Gr 

Let  n  =  n\U2  and  Dq  £  C"x"  be  a  diagonal  matrix  with  diagonal  entries  defined  by 

f  1,  if  (i-  j)  £  f2, 

(-Dn)fcyfey  =  <  for  1  <  i  <  m,  1  <  j  <  n2,  (3.15) 

I  0,  otherwise, 

where  kij  =  i  +  ( j  —  l)m.  Then  (Dnvec(A'), Dnvec(F))  =  {VnX,VnY)  for  any  X,Y  £  CniXn2.  Letting 
yt  =  vec(C2(V’t))  for  t  =  1, . . . ,  K ,  we  have 


Gts  =  (VnC2(A),VnC2^s))  =  <iW,  Ail/8)  =  (y^Dny8. 


Let  Y  =  J2sys(ys)*-  Then 


||G||!  =  ^conj(Gts)  -Gts  =  J2  {(tfYDny8)  ■  (Q/TW) 

t,s  t,s 

=  Y.^YiDnYDn)^  =  YtT  (DnY Day* (»*)*)  =  tr(£>ny£>nr) 

t  t 

=  E(^raoL)ii  =  x;(i3ny)«  •  (IWii  =  £(Ai)«  •  Yj  ■  {Dn)jj  ■  Yji 

i  i,j  i,j 

=  ^(-Dn)«  '  Yj  ■  conj (YZJ)  ■  (Dn)jj  =  pT Hp, 
i,3 

where  H  =  [|Yj;jj2]  and  p  =  diag(Dn).  In  addition, 

tr(G)  =  YG"  =  YWYDnV*  =  JXAtfW)  =  to(DnY)  =  fTp , 

t  t  t 

where  /  =  diag(y).  Hence,  (3.14)  is  equivalent  to 

min  -pT  Hp  —  /Tp,  subject  to  pk  =  Ifil  and  pk  E  {0, 1},  Vfc.  (3.16) 

v  2  ' 

k 

4.  Algorithm  and  Implementation.  With  a  given  dictionary  4',  Algorithm  1  outlines  our  approach 
for  optimizing  a  partial  circulant  matrix  *1). 

Algorithm  1 

1.  Solve  (3.3)  for  x,  generate  randomly-phased  d  from  x  via  (3.2),  and  then  form  G  =  Fdia,g(d)F* . 

2.  Solve  (3.10)  for  p  or  randomly  generate  p  and  then  form  P  from  p  via  (3.7). 

3.  Generate  =  PC. 


For  2D,  an  optimized  partial  circulant  operator  VnC'2  can  be  obtained  in  a  similar  way.  At  Step  1  of 
Algorithm  1,  we  use  the  MATLAB  function  quadprog  to  solve  (3.3)  with  its  default  settings.  At  Step  2, 
we  use  the  commercial  code  Gurobi  [4]  with  MATLAB  interface  [32]  to  solve  the  binary  program  (3.10).  In 
our  test,  the  maximum  number  of  iterations  was  set  to  2000.  Usually,  Gurobi  terminated  at  the  maximum 
number  of  iterations  with  the  best  solution  obtained.  Hence,  (3.10)  was  only  approximately  solved. 

We  used  both  synthetic  and  real  data  for  test.  For  synthetic  data,  4/  was  Gaussian  randomly  generated 
basis  or  discrete  Fourier  basis,  and  4>  was  learned  by  Algorithm  1.  For  real  data,  was  learned  in  either 
an  uncoupled  way  or  a  coupled  way.  In  the  uncoupled  test,  we  first  learned  4/  from  a  set  of  training  data 
X  =  [X1,...,XL)e  CnxL  using  KSVD  [1]  to  solve 

minllX  —  ’FOIl!',  subject  to  ||0d|o  <  S,  V*  (4.1) 

\t,o 

for  a  given  sparsity  level  S ,  where  0  =  [0i, . . . ,  Ql]  and  ||0j||o  denotes  the  number  of  non-zeros  of  0;.  Then 
we  learned  $  by  Algorithm  1.  In  the  coupled  test,  we  simultaneously  learned  a  pair  of  4/  and  in  a  way 
similar  to  [7].  Specifically,  during  each  loop  of  the  coupled  algorithm,  we  first  compute  $  via  Algorithm  1 
with  a  fixed  4>,  then  update  0  by  solving 

mina||A  —  4'0|||.  +  |[$A  —  $'F0||^,  subject  to  ||0i||o  <  S  (4.2) 

with  respect  to  0,  where  a  >  0  is  a  weight  parameter,  and  finally  update  4t  and  0  via  solving  (4.2)  with 
respect  to  and  the  current  nonzero  entries  of  0  jointly. 
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Table  5.1 

List  of  tested  sensing  matrices  and  2D  operators* 


name 

type 

real/complex 

kernel 

downs  ampler 

dimension 

rand-circ 

circulant 

complex 

random 

random 

ID 

real-rand- circ 

circulant 

real 

random 

random 

ID 

Gaussian 

Gaussian 

complex 

random 

ID 

real-  Gaussian 

Gaussian 

real 

random 

ID 

opt-circ 

circulant 

complex 

optimized 

random 

ID 

real-opt-circ 

circulant 

real 

optimized 

random 

ID 

opt- circ-  and-  P 

circulant 

complex 

optimized 

optimized 

ID 

real-  opt-circ- and-P 

circulant 

real 

optimized 

optimized 

ID 

opt-plus-rand-circ 

circulant 

complex 

optimized+random' 

random 

ID 

coupled-plus-rand- circ 

circulant 

complex 

optimized+random 

random 

ID 

rand-2D-circ 

circulant 

complex 

random 

random 

2D 

opt-plus-rand-2D-circ 

circulant 

complex 

optimized+random 

random 

2D 

*The  kernel  v  for  real  random  ID  circulant  was  generated  by  MATLAB  command  randn(n,l)  and  that  for  complex  ran¬ 
dom  ID  circulant  by  randn(n,  l)+li*randn(n,  1) ;  real  Gaussian  was  generated  by  randn(m,n)  and  complex  Gaussian  by 
randn(m,n)+li*randn(m,n) ;  the  kernel  M  for  real  random  2D  circulant  was  generated  by  randn(nl,n2)  and  that  for  com¬ 
plex  random  2D  circulant  by  randn(nl  ,n2)+li*randn(nl  ,n2) ; 

T60%  of  the  normalized  optimized  circulant  plus  40%  of  a  normalized  real  random  circulant. 


After  obtaining  ^  and  <f>,  we  used  YALL1  [28]  (version  1.4)  to  recover  the  sparse  signal  6  via  solving 

min  ||0||i  +  -  ||4>\E'0  —  6|||,  (4.3) 

e  p 

where  b  =  4>\E'0  +  ry  was  the  measurement  contaminated  by  Gaussian  noise  r)  ~  Af( 0,  al)  and  p  >  0  was  a 
parameter  corresponding  to  a.  Throughout  our  tests,  a  was  known,  and  the  parameter  p  was  set  to  a.  If 
p  =  0,  problem  (4.3)  reduces  to 


min  1 1 0 1 1 i ,  subject  to  $4/0  =  b.  (4-4) 

6 

The  stopping  tolerance  for  YALL1  was  set  to  10-5  for  noiseless  case  and  10-4  for  noise  case,  unless  specified 
otherwise.  The  maximum  number  of  iterations  was  set  to  104. 

5.  Numerical  Simulations.  We  compared  the  performance  of  optimized  circulant  sensing  matri¬ 
ces/operators  to  that  of  random  ones  on  both  synthetic  data  and  real-world  data.  In  addition,  we  tested  a 
pair  of  circulant  sensing  matrix  >I>  and  dictionary  T  learned  together  on  real-world  data.  The  tested  sensing 
matrices  and  2D  operators  are  listed  in  Table  5.1.  To  be  fair,  except  for  the  coupled  learned  circulant  matrix 
coupled-plus-rand-  circ,  all  sensing  matrices  or  2D  operators  were  generated  and  tested  with  the  same  set  of 
synthetic  or  learned  dictionaries. 

Throughout  our  tests,  all  columns  of  T  and  $  were  normalized  to  have  the  unit  2-norm.  The  normal¬ 
ization  of  4/  is  only  for  convenience  while  that  of  $  is  critical,  for  otherwise  the  recovery  by  YALL1  or  other 
solvers  may  become  unstable.  Similarly,  we  normalized  all  partial  2D  circulant  operators.  We  next  introduce 
an  efficient  way  to  normalize  ID  matrix  PC  and  2D  operator  VqC2- 

5.1.  Normalization  of  ID  matrix  PC  and  2D  operator  VnC2 -  Let  C  be  defined  in  (2.3)  and 
c  =  [ui,  w„,  vn-\, . . . ,  V2]t  be  its  first  column.  It  is  easy  to  verify  that  the  jth  column  of  C  is  Cj  = 
[cn-j+ 2,  cn~j+ 3, . . . ,  cn,  Ci, . . . ,  cn_j_|_i]T.  Let  £  =  [£i, . . . ,  £n]T  with  £j  being  the  Euclidean  norm  of  the  jth 
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column  of  PC.  Recalling  (3.7),  we  have 


n 


where  =  mod(n  +  i  —  j,  n)  +  1.  Hence,  ij  =  \jY^'i.=iPic2sip  f°r  j  =  1  and  PC(diag(£))  1  has 

columns  with  the  unit  2-norm. 


For  VnC2,  recall  (2.5)  and  (3.15).  Then  normalizing  VqC2  is  equivalent  to  normalizing  DqC2.  Let 
n  =  nin2  and  £  =  \£\, . . . ,  £n]T  with  i3  being  the  Euclidean  norm  of  the  jth  column  of  DqC2.  Note  that  for 
1  <  i,  r  <  rii  and  1  <  s,  k  <  772, 


where  its  =  i+(s  — l)ni,  jTK  =  t  +  (k  —  1)tii  and  atT  =  mod(77i+r  — i, ni)  +  l,/3SK  =  mod(772  +  K  —  8,772)  +  1. 
Hence,  for  each  pair  of  (r,  k),  we  can  compute  £j  with  j  =  r  +  (k  —  l)?7i  via 


Therefore,  we  normalize  DqC2  to  DqC2{ diag(£))  1  and  VqC2  to  Vq.C2C,  where  £(X)  =  X  0  L  for  any 
X  £  Cnixr»2  with  L  €  C"lXn2  obtained  from  £  via  l  =  vec (L). 


5.2.  Synthetic  data.  We  tested  different  optimized  circulant  sensing  matrices  $  on  synthetic  data 


along  with  two  kinds  of  bases  ’Ll  the  Gaussian  random  basis  and  the  discrete  Fourier  basis.  The  per¬ 
formance  was  compared  to  that  of  unoptimized  random  circulant,  as  well  as  random  Gaussian,  sensing 
matrices.  The  Gaussian  random  basis  was  generated  by  MATLAB  command  randn(n)  and  then  turned  to 
an  orthogonal  matrix  by  QR  decomposition,  and  the  Fourier  basis  was  generated  by  MATLAB  command 
dftmtx(n)/sqrt(n).  Both  bases  were  512  x  512,  and  the  sensing  matrices  had  the  size  64  x  512,  i.e. ,  an 
8x  downsample.  In  this  test,  9  was  generated  with  k  randomly  located  nonzero  entries  sampled  from  the 
standard  Gaussian  distribution  and  then  normalized  to  have  the  unit  2-norm.  The  stopping  tolerance  for 
YALL1  was  set  to  5  x  10-8.  Figure  5.1  depicts  the  comparison  results  of  sensing  matrices:  rand-circ ,  Gaus¬ 
sian ,  opt-circ ,  opt-circ-and-P ,  as  well  as  their  real-valued  counterparts.  We  call  a  recovery  9*  successful  if 
the  relative  error  ||0*  —  0||2/||0||2  was  below  10-4.  We  calculated  the  success  rate  out  of  50  independent  trials 
at  every  sparsity  level  k. 

All  the  four  subfigures  of  Figure  5.1  reveal  that  optimized  circulant  sensing  matrices  led  to  equally  good 
performance  as  random  Gaussian  matrices.  For  the  random  basis,  random  circulant  matrices  achieved  sim¬ 
ilar  recovery  success  rate,  while  they  performed  extremely  bad  for  the  discrete  Fourier  basis.  The  reason 
for  the  bad  performace  of  random  circulant  matrices  can  be  found  in  [30].  On  the  other  hand,  optimizing 
the  selection  matrix  P  hardly  made  further  improvement.  We  believe  that  unless  the  underlying  signal  has 
dominant  frequencies,  optimizing  the  selection  matrix  P  will  not  lead  to  consistent  improvement.  Therefore, 
in  the  subsequent  tests,  we  let  P  be  random  selection  matrices.  In  addition,  the  improvement  by  optimiz¬ 
ing  circulant  sensing  matrices  over  randomly  generated  ones  is  similar  for  real  and  complex- valued.  For 
convenience,  we  use  complex- valued  sensing  matrices/operators  in  the  rest  tests. 

Although  the  tests  above  are  based  on  synthetic  signals  that  are  exactly  sparse,  similar  performance  of 
different  sensing  matrices  was  observed  on  those  that  are  nearly  sparse.  Since  real  images  are  nearly  sparse 
(under  certain  dictionaries),  we  skip  presenting  the  results  of  nearly  sparse  synthetic  signals  and  proceed  to 
the  real  image  tests. 
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Fig.  5.1.  Comparison  results  of  two  groups  of  sensing  matrices:  rand-circ,  Gaussian,  opt-circ,  opt-circ-and-P,  and  their 
real  valued  counterparts  with  Gaussian  random  basis  (left)  and  Fourier  basis  (right). 


5.3.  Image  tests  with  circulant  matrices.  In  this  subsection,  we  present  the  performance  of  sensing 
matrices  rand-circ ,  Gaussian ,  opt-plus-rand-circ  and  coupled-plus-rand- circ  of  size  64  x  256  on  real  images.2 
The  dictionaries  for  all  of  them  were  learned  from  the  same  training  set.  The  first  three  sensing  matrices 
shared  the  same  set  of  dictionaries  learned  by  KSVD  [1]  with  sparsity  level  S  =  6,8  in  (4.1),  and  the  last  one 
was  learned  simultaneously  with  its  corresponding  dictionary  by  the  coupled  method  described  in  Section 
4  with  sparsity  level  S  =  6,8  and  a  =  A  in  (4.2).  This  choice  of  a  was  recommended  in  [7].  All  images 
used  in  these  tests  were  scaled  to  have  the  unit  maximum  pixel  value.  The  training  data  consists  of  20,000 
8x8  patches,  that  are  100  randomly  extracted  patches  from  each  of  the  200  images  in  the  training  set  of 
the  Berkeley  segmentation  dataset  [17].  The  100  images  in  the  testing  set  were  used  to  measure  the  recovery 
performance. 

In  the  first  set  of  tests,  we  uniformly  randomly  extracted  non-overlapping  600  patches  from  the  100 
test  images  to  recover  using  their  measurements  obtained  with  the  four  sensing  matrices.  Figure  5.2  depicts 
the  mean  squared  error:  MSE  =  J2i=i  ll3^  ~~  4/0I|||/(64£),  for  sparsity  level  S  =  6,8  and  measurement  size 
m  =  16,24.  Here,  xl  is  the  vector  obtained  by  reshaping  the  ith  selected  patch,  l  =  600  is  the  number  of 
tested  patches,  and  9l  was  the  YALL1  solution  of 


min  || 0 1| i  H — ||4>T0  —  5||2,  (5.1) 

9  a 

where  b  =  §xl  +  77,  77  ~  Af(0,al)  is  Gaussian  noise  and  a  =  d'||$x*||00/||r?||00  with  a  taking  values 
0,0.01,0.05,0.10,0.15.  All  results  are  the  averages  of  20  independent  trials. 


2We  also  tested  opt-circ  and  coupled-circ ,  and  found  that  they  performed  as  well  as  opt-plus-rand-circ  and  coupled-plus- 
rand-circ  on  average.  However,  they  tend  to  cause  the  loss  of  small  image  features  that  may  not  be  well  described  by  the 
dictionary. 
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Fig.  5.2.  Comparison  results*  of  four  different  sensing  matrices:  rand-circ,  Gaussian,  opt-plus-rand-circ  and  coupled- 
plus-rand-circ  on  Berkeley  segmentation  dataset  for  different  KSVD  sparsity  levels  S  =  6,8  and  different  sampled  row  numbers 
m  =  16,  24. 


NB  rand-circ 

'HttMl  rand-circ 

S _ ,  Gaussian 

! _ ;  Gaussian 

i _ J  opt-plus-rand-circ 

i _ j  opt-plus-rand-circ 

■■■  coupled-plus-rand-circ 

■■■  coupled-plus-rand-circ 

(a)  m  =  16,  S  =  6 


(b)  m  =  24,  S  =  6 


MHR  rand-circ 

VHMR  rand-circ 

[ _ [  Gaussian 

[ _ ;  Gaussian 

l _ j  opt-plus-rand-circ 

l _ j  opt-plus-rand-circ 

coupled-plus-rand-circ 

coupled-plus-rand-circ 

(c)  m  =  16,  S  =  8  (d)  m  =  24,  S  =  8 

‘Some  MSEs  are  out  of  the  display  range  and  they  are  larger  than  0.02. 


All  the  four  pictures  in  Figure  5.2  reveal  that  both  uncoupled  and  coupled  learning  approaches  achieved 
significantly  better  recovery  over  random  circulant  matrices,  and  they  did  even  better  than  Gaussian  random 
matrices  when  m  =  16.  The  coupled  learning  approach  made  slight  improvement  over  the  uncoupled  one. 

In  the  second  set  of  tests,  we  chose  two  out  of  the  100  test  images  to  recover  using  their  measurements 
obtained  with  the  four  different  sensing  matrices.  The  first  image  had  the  resolution  of  481  x  321,  and  the 
second  one  had  the  resolution  of  321  x  481.  For  convenience,  we  removed  the  last  row  and  the  last  column 
of  both  the  two  images  and  partitioned  each  images  into  1,200  8  x  8  patches.  Then  we  used  YALL1  to 
solve  (5.1)  for  each  patch  with  the  dictionary  learned  by  KSVD  with  the  sparsity  level  S  =  6.  We  used 
peak-signal-to-noise-ratio  (PSNR)  and  mean  squared  error  (MSE)  to  measure  the  performance  of  recovery. 
Table  5.2  lists  the  average  results  of  20  independent  trials  for  measurement  size  m  =  16,  24  and  noise  level 
a  =  0,0.01,0.05,0.10,0.15.  One  set  of  the  recovered  images  are  shown  in  Figure  5.3.  From  the  results,  we 
can  see  that  both  uncoupled  and  coupled  learning  approaches  made  significant  improvement  over  random 
circulant  matrices,  especially  for  m  =  16.  When  a  >  0.10  or  m  =  16,  the  coupled  learning  approach  did 
even  better  than  random  Gaussian  matrices.  Coupled  learned  circulant  matrices  tend  to  do  better  than 
Gaussian  random  ones  when  m  is  smaller.  Although  the  best  PSNRs  were  achieved  with  Gaussian  random 
matrices,  the  best  visual  results  are  those  with  the  coupled  learned  circulant  matrices.  In  addition,  the 
coupled  approach  did  slightly  better  than  the  uncoupled  one. 
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Table  5.2 

Comparison  results  of  recovered  images  Castle  and  Gulf  by  YALL1  with  four  different  sensing  matrices:  rand-circ,  Gaus¬ 
sian,  opt-plus-rand-circ,  coupled-plus-rand-circ 


Castle 

rand-circ 

Gaussian 

opt- plus- rand-  circ 

coupled-plus-rand-circ 

m 

<7 

PSNR 

MSE 

PSNR 

MSE 

PSNR 

MSE 

PSNR 

MSE 

16 

0.00 

19.28 

4.81e-002 

25.60 

2.92e-003 

25.03 

3.16e-003 

25.31 

2.96e-003 

16 

0.01 

18.60 

4.70e-002 

24.39 

3.83e-003 

24.46 

3.62e-003 

24.76 

3.36e-003 

16 

0.05 

18.48 

4.45e-002 

24.09 

4.08e-003 

24.01 

4.01e-003 

24.36 

3.68e-003 

16 

0.10 

18.12 

4.53e-002 

23.17 

5.08e-003 

23.37 

4.63e-003 

23.70 

4.28e-003 

16 

0.15 

17.59 

4.79e-002 

22.26 

6.25e-003 

22.78 

5.30e-003 

23.08 

4.93e-003 

24 

0.00 

25.65 

8.57e-003 

28.76 

1.33e-003 

27.03 

2.00e-003 

27.43 

1.82e-003 

24 

0.01 

24.25 

7.08e-003 

27.29 

1.88e-003 

26.26 

2.45e-003 

26.67 

2.18e-003 

24 

0.05 

23.99 

7.90e-003 

26.63 

2.19e-003 

25.56 

2.86e-003 

25.97 

2.56e-003 

24 

0.10 

23.37 

9.12e-003 

25.51 

2.82e-003 

24.66 

3.49e-003 

25.08 

3.13e-003 

24 

0.15 

22.70 

1.07e-002 

24.41 

3.65e-003 

23.90 

4.13e-003 

24.30 

3.74e-003 

Gulf 

rand-circ 

Gaussian 

opt- plus- rand-  circ 

coupled-plus-rand-circ 

m 

<7 

PSNR 

MSE 

PSNR 

MSE 

PSNR 

MSE 

PSNR 

MSE 

16 

0.00 

19.53 

4.28e-002 

25.61 

2.93e-003 

25.13 

3.11e-003 

25.36 

2.93e-003 

16 

0.01 

18.96 

4.15e-002 

24.56 

3.69e-003 

24.63 

3.48e-003 

24.90 

3.26e-003 

16 

0.05 

18.81 

3.93e-002 

24.22 

3.99e-003 

24.22 

3.83e-003 

24.54 

3.54e-003 

16 

0.10 

18.51 

4.05e-002 

23.49 

4.70e-003 

23.59 

4.42e-003 

23.94 

4.06e-003 

16 

0.15 

18.13 

4.20e-002 

22.60 

5.71e-003 

23.00 

5.05e-003 

23.34 

4.66e-003 

24 

0.00 

25.96 

7.67e-003 

29.06 

1.24e-003 

27.38 

1.85e-003 

27.58 

1.76e-003 

24 

0.01 

24.60 

6.69e-003 

27.73 

1.70e-003 

26.68 

2.22e-003 

26.95 

2.04e-003 

24 

0.05 

24.33 

7.23e-003 

27.05 

1.98e-003 

26.01 

2.59e-003 

26.29 

2.38e-003 

24 

0.10 

23.63 

8.46e-003 

25.94 

2.56e-003 

25.08 

3.18e-003 

25.43 

2.90e-003 

24 

0.15 

23.12 

9.54e-003 

24.85 

3.30e-003 

24.27 

3.80e-003 

24.66 

3.45e-003 

5.4.  Image  tests  with  2D  circulant.  For  the  tests  in  Section  5.3,  each  selected  patch  was  reshaped 
to  a  vector  by  stacking  its  columns.  In  this  subsection,  we  compare  rand-2D-circ  and  opt-plus-rand-2D-circ 
to  illustrate  how  to  directly  apply  2D  circulant  operator  on  the  squared  patches.  The  dictionary  learned 
by  KSVD  in  Section  5.3  with  the  sparsity  level  S  =  6  was  used  in  this  test.  Note  that  each  atom  in  the 
dictionary  becomes  a  squared  patch  as  opposed  to  a  vector.  We  used  the  same  600  patches  and  the  same 
two  images  as  in  Section  5.3  for  this  test.  Instead  of  solving  (5.1),  now  YALL1  was  employed  to  solve 

min\\e\\1  +  -\\VnC2CQ(e)-b\\22,  (5.2) 

o  a 

where  VqCzC  denotes  normalized  partial  2D  circulant  operator,  Q  is  defined  in  Section  3.2,  b  =  VqC2C(X1)  + 
r)  with  the  same  ?y  and  a  to  those  in  Section  5.3,  and  X1  is  the  ith  tested  patch  corresponding  to  the 
reshaped  vector  xl  in  (5.1).  Figure  5.4  plots  the  average  MSEs  of  20  independent  trials  with  the  number  of 
measurements  m  =  16,  24  and  noise  level  <t  =  0, 0.01, 0.05, 0.10, 0.15,  and  Figure  5.5  plots  one  set  of  recovered 
images  Castle  and  Gulf  by  YALL1  with  the  number  of  measurements  m  =  24  and  noise  level  cf  =  0.01.  We 
can  see  that  similar  results  were  obtained  as  in  Section  5.3  where  circulant  sensing  matrices  were  compared. 

6.  Image-scale  recovery.  In  previous  tests,  we  divided  an  image  into  a  number  of  small  patches  and 
recovered  each  patch  from  its  linear  measurements,  since  the  learned  circulant  sensing  matrices/operators 
need  to  match  those  dictionary  atoms  in  size.  In  practice,  we  often  take  measurements  directly  from  the 
entire  image  instead  of  its  small  patches.  For  image  recovery  to  go  from  the  patch  scale  to  the  image  scale, 
one  approach  is  to  make  an  image-scale  dictionary  under  which  the  underlying  image  is  sparse.  In  this 
section,  we  introduce  how  to  generate  an  image-scale  dictionary  based  on  patches,  and  we  compare  two 
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Fig.  5.3.  One  set  of  recovered  images  Castle  (upper  four)  and  Gulf  (lower  four)  by  YALL1  with  four  different  sensing 
matrices:  rand-circ,  Gaussian,  opt-plus-rand-circ,  coupled-plus-rand-circ  for  noise  level  a  =  0.01,  KSVD  sparsity  level  S  =  6 
and  selected  row  number  m  =  24 


rand-circ 
PSNR— 26.78 


Gaussian 

PSNR=28.30 


opt-plus-rand-circ 

PSNR=26.85 


coupled-plus-rand-circ 
PSNR— 27.86 


rand-circ ,  PSNR=27.41 


Gaussian,  PSNR=28.90 


opt-plus-rand-circ,  PSNR=27.50 


coupled-plus-rand-circ,  PSNR=28.13 


different  partial  2D  circulant  operators  based  on  the  generated  image-scale  dictionary. 

Note  that  fully  random  sensing  operators  are  out  of  the  game  at  the  image  scale  since  the  number  of  free 
entries  of  each  is  at  least  ten  thousand  times  of  the  image  resolution,  making  such  an  operator  impossible 
to  fit  in  the  memory  of  a  typical  workstation. 

Let  X  £  CN ixiv2  ]-,e  an  image,  and  assume  that  each  of  its  small  patches  x  £  Cn lXn2,  where  ni  <  Ni  and 
n-2  <  N2,  has  a  sparse  representation  under  the  dictionary  d*0  =  . . .  ,iPk]-  We  want  to  recover  X  from 

the  measurements  b  =  G(X)  +  if,  where  Q  =  VQ.C2C  is  a  normalized  partial  2D  circulant  operator  defined  on 
CN'*N*  and  if  is  Gaussian  noise. 

First,  we  construct  an  image-scale  dictionary  IF  from  'I'0.  Let  nr  =  ,  nc  =  \_^\ ,  sr  =  Ni—nrn\,sc  = 

N2  —  ncn 2,  and  N  =  nrnc  +  sign(sr)nc  +  sign(sc)nr  +  sign(srsc).  For  each  atom  ip^,  we  form  N  image-scale 
atoms  'Ll,  'Fji . . . ,  £  £Nxxn2  ag  f0u0ws  por  j  —  1, . . . ,  nc  and  i  =  1, . . . ,  nr,  let  t  =  ( j  —  1  )nr  +  i  and  tk* 
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Fig.  5.4.  Comparison  results*  of  solutions  by  YALL1  with  two  different  2D  circulant  operators:  rand-2D-circ  and  opt- 
plus-rand-2D-circ  from  m  =  16  (left)  and  m  =  24  (right)  measurements 


MMMI  rand-2D-circ 

MHH  rand-2D-circ 

■■■  opt-plus-rand-2D-circ 

■■■  opt-plus-rand-2D-circ 

*Some  MSEs  are  out  of  the  display  range  and  larger  than  0.02  or  even  0.04. 


Fig.  5.5.  One  set  of  recovered  images  Castle  (left  two)  and  Gulf  (right  two)  by  YALL1  using  rand-2D-circ  and  opt-plus- 
rand-2D-circ  with  m  =  24  measurements  and  noise  level  a  =  0.01 


rand-2D-circ ,  PSNR  =  25.86 


be  the  matrix  with  all  elements  being  zero  except  having  'ipk  on  the  submatrix  indexed  by  the  (nr  (i—  1)  +  l)th 
through  (nr*)th  row  and  the  (nc(j  —  1)  +  l)th  through  (ncj) th  column.  If  sr  ^  0,  let  t  =  nrnc  +  j  and  tt* 
be  the  matrix  with  all  elements  being  zero  except  having  the  last  sr  rows  of  tpk  on  the  submatrix  indexed 
by  the  (nrni  +  l)th  through  TVith  row  and  the  (■ nc(j  —  1)  +  l)th  through  (ncj) th  column  for  j  =  1, . . . ,  nc. 
If  sc  ^  0,  nr  more  atoms  are  formed  in  a  similar  way  using  the  last  sc  columns  of  ipk-  If  scsr  7^  we  form 
the  last  atom  with  all  elements  being  zero  except  having  the  right-bottom  sr  x  sc  submatrix  of  on 
its  right-bottom  sr  x  sc  submatrix.  Note  that  any  two  atoms  generated  from  %l>k  have  no  common  non-zero 
parts.  The  image  X  has  a  sparse  representation  under  \1/  since  each  patch  of  X  is  sparse  under  T0.  Figure 
6  illustrates  how  we  form  N  image-scale  atoms  Tl7  T2, . . . ,  $jv  from  a  small  atom  ip. 

Then,  we  can  recover  X  from  the  measurements  b  =  G(X)  +  rj  by  solving 

minWeih  +  ^\\GQ{9) -b\\l,  (6.1) 
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Fig.  6.1.  Form  N  image-scale  atoms  from  a  small  patch  ip. 
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Fig.  6.2.  One  set  of  recovered  images  Plane  (left  two)  and  Gulf  (right  two)  by  YALL1  with  two  different  2D  circulant 
operators:  rand-2D-circ  and  opt-plus-rand-2D-circ  for  sample  ratio  SR  =  0.30  and  noise  level  a  =  0.01 


rand-2D-circ 
PSNR  =  29.59 


opt-plus-rand-2D-circ 
PSNR  =  30.51 


rand-2D-circ 
PSNR  =  26.60 


opt-plus-rand-2D-circ 
PSNR  =  27.19 


where  Q  is  a  2D  operator  defined  in  the  same  way  as  in  Section  3.2,  r/  ~  A/"(0,  er I)  is  Gaussian  noise  and 
a  =  <j||t/(X)||00/||r?||00.  At  first  glance,  the  recovery  by  solving  (6.1)  may  take  a  lot  of  time  since  the 
dictionary  \F  has  so  many  atoms.  However,  the  structure  of  these  atoms  enables  even  faster  recovery  than 
that  using  patch-size  dictionaries. 

We  compared  rand-2D-circ  and  opt-plus-rand-2D-circ  on  two  images.  Both  of  them  had  the  resolution 
of  241  x  129,  and  they  were  obtained  by  cropping3  their  larger  originals  Plane  and  Gulf  in  the  testing  set 
of  Berkeley  segmentation  dataset.  The  noise  level  a  was  set  to  0,0.01,0.05,0.10,0.15  and  the  sample  ratio 
SR  :=  to  0.20,0.30.  Table  6.1  lists  the  average  results  of  20  independent  trials,  and  Figure  6.2  plots 
one  set  of  recovered  images  Plane  (left  two)  and  Gulf  (right  two)  for  a  =  0.01  and  SR  =  0.30.  From  the 
table,  we  can  see  that  the  optimized  circulant  operators  did  significantly  better  than  the  random  ones  on 
average. 
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able  suggestions.  The  authors  also  thank  Dr.  Ming  Yan  for  his  comments  on  complex-valued  random  sensing 
matrices.  The  authors’  work  has  been  supported  in  part  by  AFOSR,  ARL  and  ARO,  DARPA,  NSF,  and 
ONR. 

3The  optimized  2D  circulant  operator  was  obtained  by  the  method  discussed  in  Section  3.2,  and  it  took  us  quite  a  long 
time.  The  learning  on  larger  images  would  take  much  more  time.  To  save  time,  we  simply  used  the  cropped  images  to  test  our 
idea. 


17 


Table  6.1 

Comparison  results  of  recovered  full-size  images  by  YALL1  with  two  different  2D  circulant  operators:  rand-2D-circ  and 
opt-plus-rand-2D-circ;  SR=sample  ratio 


Plane 

rand-2D-circ 

opt-plus-rand-2D-circ 

Plane 

rand-2D-circ 

opt-p  lus- rand-  2D- circ 

SR 

(J 

PSNR 

MSE 

PSNR 

MSE 

SR 

CT 

PSNR 

MSE 

PSNR 

MSE 

0.20 

0.00 

21.16 

4.58e-002 

26.68 

2.18e-003 

0.30 

0.00 

27.75 

2.33e-002 

30.17 

9.63e-004 

0.20 

0.01 

21.35 

4.30e-002 

26.60 

2.21e-003 

0.30 

0.01 

27.95 

2.16e-002 

30.13 

9.73e-004 

0.20 

0.05 

20.62 

4.27e-002 

25.16 

3.07e-003 

0.30 

0.05 

26.55 

2.17e-002 

27.95 

1.61e-003 

0.20 

0.10 

19.32 

4.58e-002 

23.13 

4.92e-003 

0.30 

0.10 

24.56 

2.38e-002 

25.43 

2.90e-003 

0.20 

0.15 

18.15 

4.96e-002 

21.56 

7.06e-003 

0.30 

0.15 

22.93 

2.63e-002 

23.58 

4.47e-003 

Gulf 

rand-2D-circ 

opt-plus-rand-2D-circ 

Gulf 

rand-2D-circ 

opt-p  lus- rand-  2D-  circ 

SR 

(J 

PSNR 

MSE 

PSNR 

MSE 

SR 

(J 

PSNR 

MSE 

PSNR 

MSE 

0.20 

0.00 

20.60 

1.92e-002 

24.50 

3.57e-003 

0.30 

0.00 

25.45 

8.53e-003 

27.10 

1.95e-003 

0.20 

0.01 

20.64 

1.90e-002 

24.42 

3.63e-003 

0.30 

0.01 

25.52 

8.41e-003 

27.11 

1.94e-003 

0.20 

0.05 

20.43 

1.93e-002 

24.05 

3.95e-003 

0.30 

0.05 

25.07 

8.82e-003 

26.49 

2.24e-003 

0.20 

0.10 

19.85 

2.05e-002 

23.20 

4.80e-003 

0.30 

0.10 

24.07 

9.84e-003 

25.24 

3.00e-003 

0.20 

0.15 

19.22 

2.20e-002 

22.32 

5.89e-003 

0.30 

0.15 

23.11 

1.10e-002 

24.11 

3.90e-003 
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